DeTEXT: Programming by Example
Abstract
Data cleaning and manipulation is an expensive and tedious process that consumes valuable time and resources. An end-user who needs to manipulate a large amount of data has three primary options: record a macro, write a transformation program, or manipulate the data manually. Recording a macro with a fixed series of actions can be simple but does not generalize well and is not sufficiently expressive for many tasks. Writing a transformation program requires expert knowledge unavailable to many end-users. And depending on the quantity of data, manual manipulation may not even be feasible. Programming by demonstration (PbD) offers an intuitive alternative to these approaches.
In a PbD setting, an end-user demonstrates the actions required to transform input examples into output examples. A machine learning algorithm observes these transformation sequences, called traces, with the goal of inducing a program that transforms all future examples in the desired manner. A line of successful research on PbD uses a formalism called version space algebra [7] to represent the space of all hypotheses (programs) that are consistent with the input-output examples. The version space algebra defines useful operators, such as union and join, that enable the construction of complex version spaces from simple ones. SmartEDIT [5] is a PbD system for learning text transformations.
However, it is not always possible to observe traces. Observing traces requires tight integration with text editors, which come in many forms and often do not expose their source code for PbD integration. Furthermore, input-output examples may have been pre-recorded, as in a partially completed spreadsheet. Spreadsheet users, who often need to clean or manipulate data, number over 500 million worldwide, so the opportunity for time savings and increased productivity is large. The problem of inductively learning programs without traces is called programming by example (PbE).
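To make the notion of a version space concrete, the following is a minimal Python sketch, not DeTEXT's or SmartEDIT's actual implementation. It assumes a tiny hand-written hypothesis space of string programs (the names CASE_PROGRAMS, TRIM_PROGRAMS, version_space, and union are hypothetical), treats a version space as the subset of candidates consistent with every input-output example, and approximates the union operator by merging two candidate sets. The join operator is omitted.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]          # (input text, desired output text)
Program = Callable[[str], str]     # a candidate text transformation

# Hypothetical hypothesis spaces; a real system would use a far richer language.
CASE_PROGRAMS: Dict[str, Program] = {
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
    "capitalize": str.capitalize,
}
TRIM_PROGRAMS: Dict[str, Program] = {
    "strip": str.strip,
    "identity": lambda s: s,
}


def union(a: Dict[str, Program], b: Dict[str, Program]) -> Dict[str, Program]:
    """Simplified union: merge two candidate sets over the same input/output types."""
    return {**a, **b}


def version_space(candidates: Dict[str, Program],
                  examples: List[Example]) -> Dict[str, Program]:
    """Keep only the programs that reproduce every observed example."""
    return {
        name: prog
        for name, prog in candidates.items()
        if all(prog(inp) == out for inp, out in examples)
    }


EXAMPLES: List[Example] = [("hello", "Hello"), ("new york", "New York")]
ALL_PROGRAMS = union(CASE_PROGRAMS, TRIM_PROGRAMS)

# One example leaves two consistent hypotheses; a second example disambiguates.
after_one = version_space(ALL_PROGRAMS, EXAMPLES[:1])
after_two = version_space(ALL_PROGRAMS, EXAMPLES)
print(sorted(after_one))  # ['capitalize', 'title']
print(sorted(after_two))  # ['title']
```

In this sketch no trace is observed: only the input-output pairs constrain the candidates, which is the PbE setting. Each additional example can only shrink the set of consistent programs.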
PbE is a strictly harder problem than PbD; the learning algorithm observes less information and thus must consider a much larger space of possible hypotheses. Our research has focused on lifting the PbD methods used by SmartEDIT to the PbE context. This work was conducted independently of [1], which also aims to solve this problem. In this work, we present DeTEXT, a PbE system that is able to save users time in a number of repetitive text-editing scenarios. While the rich language of text transformations we originally envisioned is beyond the scope of this project, DeTEXT learns simple editing operations from three or fewer examples. We make the following contributions: